Be Careful What You Backpropagate: A Case For Linear Output Activations & Gradient Boosting
Abstract
In this work, we show that saturating output activation functions, such as the softmax, impede learning on a number of standard classification tasks. Moreover, we present results showing that the utility of the softmax does not stem from its normalization, as some have speculated; in fact, the normalization makes things worse. Rather, the advantage lies in the exponentiation of error gradients. This exponential gradient boosting is shown to speed up convergence and improve generalization. We demonstrate faster convergence and better performance on diverse classification tasks: image classification on CIFAR-10 and ImageNet, and semantic segmentation on PASCAL VOC 2012. In the latter case, using a state-of-the-art neural network architecture, the model converged 33% faster with our method than with the standard softmax activation, and with slightly better performance.
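The first claim, that a saturating output activation impedes learning, can be illustrated with a small numerical sketch. It assumes a squared-error loss applied to the outputs and three-class logits chosen purely for illustration; the function names are hypothetical. It compares the gradient that reaches the logits through a softmax with the gradient through an identity (linear) output. It does not reproduce the authors' exponential gradient boosting, whose exact form is not given in this abstract.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(z):
    """J[i][j] = dp_i/dz_j = p_i * (delta_ij - p_j)."""
    p = softmax(z)
    n = len(p)
    return [[p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(n)]
            for i in range(n)]


def grad_mse_softmax(z, y):
    """Gradient of 0.5 * sum((softmax(z) - y)^2) w.r.t. the logits: J^T (p - y)."""
    p = softmax(z)
    jac = softmax_jacobian(z)
    err = [pi - yi for pi, yi in zip(p, y)]
    n = len(z)
    return [sum(jac[i][j] * err[i] for i in range(n)) for j in range(n)]


def grad_mse_linear(z, y):
    """Gradient of 0.5 * sum((z - y)^2) w.r.t. z for an identity output: z - y."""
    return [zi - yi for zi, yi in zip(z, y)]


def l2_norm(v):
    """Euclidean length of a gradient vector."""
    return math.sqrt(sum(x * x for x in v))


if __name__ == "__main__":
    y = [1.0, 0.0, 0.0]             # one-hot target: class 0 (illustrative)
    neutral = [0.0, 0.0, 0.0]       # unsaturated logits
    saturated = [-10.0, 10.0, 0.0]  # confidently wrong: softmax saturated on class 1
    for name, z in [("neutral", neutral), ("saturated", saturated)]:
        print(name,
              "softmax grad norm:", l2_norm(grad_mse_softmax(z, y)),
              "linear grad norm:", l2_norm(grad_mse_linear(z, y)))
```

With the saturated logits, the softmax path returns a gradient several orders of magnitude smaller than in the neutral case, even though the prediction is worse. The identity output's gradient instead grows with the error.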
Similar resources
Identification of Multiple Input-multiple Output Non-linear System Cement Rotary Kiln using Stochastic Gradient-based Rough-neural Network
Because of the interactions among the variables of a multiple input-multiple output (MIMO) nonlinear system, its identification is a difficult task, particularly in the presence of uncertainties. The cement rotary kiln (CRK) is a MIMO nonlinear system in cement plants, with a complicated mechanism and uncertain disturbances. The identification of CRK is very important for different pur...
Wavelet-based gradient boosting
A new data science tool named wavelet-based gradient boosting is proposed and tested. The approach is a special case of componentwise linear least-squares gradient boosting and involves wavelet functions of the original predictors. Wavelet-based gradient boosting takes advantage of the approximate L1 penalization induced by gradient boosting to give appropriately penalized additive fits. The method...
Linear and Nonlinear Trading Models with Gradient Boosted Random Forests and Application to Singapore Stock Market
This paper presents new trading models for the stock market and tests whether they are able to consistently generate excess returns on the Singapore Exchange (SGX). Instead of conventional ways of modeling stock prices, we construct models that relate the market indicators directly to a trading decision. Furthermore, unlike a reversal trading system or a binary system of buy and sell, we allo...
Predictive Risk Mapping of Leptospirosis for North of Iran Using Pseudo-absences Data
Leptospirosis is a common zoonotic disease with a high prevalence worldwide and is recognized as an important public health problem in both developing and developed countries owing to epidemics and increasing prevalence. Because of the high diversity of hosts capable of carrying the causative agent, this disease has an expansive geographical reach. Various environmental and social ...
ICLR 2017: Categorical Reparameterization with Gumbel-Softmax
Categorical variables are a natural choice for representing discrete structure in the world. However, stochastic neural networks rarely use categorical latent variables due to the inability to backpropagate through samples. In this work, we present an efficient gradient estimator that replaces the non-differentiable sample from a categorical distribution with a differentiable sample from a nove...
Journal title:
- CoRR
Volume: abs/1707.04199
Publication date: 2017